Abstract: Fano's inequality relates the conditional entropy to the probability of error. It has been the key tool in proving the converses of coding theorems for the past sixty years. In this paper, an extended Fano's inequality is proposed, which is tighter and better suited to coding in the finite blocklength regime. Lower bounds on the mutual information and an upper bound on the codebook size are also given, and they are shown to be tighter than those obtained from the original Fano's inequality. In particular, the extended Fano's inequality is tight for some symmetric channels, such as the q-ary symmetric channel (QSC).

Index Terms: Fano's inequality, finite blocklength regime, channel coding, Shannon theory.

I. Introduction 

Shannon's information theory deals mainly with the representation and transmission of information. In the development of both the source and channel coding theorems, and especially of their converses, Fano's inequality serves as the key tool.

Theorem 1: Fano's Inequality

Let $X$ and $Y$ be two random variables with joint distribution $(X, Y) \sim p(x, y)$ and $X, Y \in \mathcal{X}$. Define $P_e = \Pr\{X \ne Y\}$. Then

$$H(X|Y) \le H(P_e) + P_e \log(M - 1), \tag{1}$$

where $M = |\mathcal{X}|$ is the cardinality of the common alphabet of $X$ and $Y$, and $H(P_e)$ is the binary entropy function $H(x) = -[x \log x + (1 - x) \log(1 - x)]$ for $0 \le x \le 1$. Since $H(P_e) \le 1$, Fano's inequality can be further relaxed to $H(X|Y) \le 1 + P_e \log(M - 1)$.

Usually, the left-hand side of (1) is referred to as the equivocation, which is quantified by the conditional entropy. It represents the uncertainty about whether the restored/decoded message $Y$ is the same as the original one, i.e., $X$. The right-hand side, on the other hand, expresses the reliability of the source/channel coding as a function of the error probability $P_e$. It has been shown that vanishing equivocation implies vanishing error probability. However, vanishing error probability does not necessarily guarantee vanishing equivocation, especially for some $X$, $Y$ with countably infinite alphabets.

In proving the converse of coding theorems, one wants to find an upper bound on the size of the codebook for a given code length and error probability. The following theorem is an immediate consequence of Fano's inequality, simple but useful.

Theorem 2: Suppose $X$ and $Y$ are two random variables that take values on the same finite set with cardinality $M$ and at least one of them is equiprobable. Then the mutual information between them satisfies

$$I(X;Y) \ge (1 - P_e) \log M - H(P_e), \tag{2}$$

where $P_e = \Pr\{X \ne Y\}$ and $H(x) = -[x \log x + (1 - x) \log(1 - x)]$.
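
As a quick numerical illustration of (1), the following minimal Python sketch computes $H(X|Y)$ and $P_e$ from a joint distribution and compares them with the right-hand side of Fano's inequality. The function names and the 3-by-3 joint distribution are illustrative assumptions, not taken from the paper, and logarithms are taken to base 2.

```python
import math

def binary_entropy(x):
    """Binary entropy H(x) in bits, with the convention 0 log 0 = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -(x * math.log2(x) + (1 - x) * math.log2(1 - x))

def conditional_entropy_and_error(joint):
    """joint[x][y] = Pr{X = x, Y = y}; returns (H(X|Y) in bits, P_e)."""
    m = len(joint)
    h = 0.0
    for y in range(m):
        p_y = sum(joint[x][y] for x in range(m))
        for x in range(m):
            if joint[x][y] > 0:
                h -= joint[x][y] * math.log2(joint[x][y] / p_y)
    p_e = 1.0 - sum(joint[x][x] for x in range(m))
    return h, p_e

def fano_rhs(p_e, m):
    """Right-hand side of (1): H(P_e) + P_e log(M - 1)."""
    return binary_entropy(p_e) + p_e * math.log2(m - 1)

# Hypothetical joint distribution on a ternary alphabet (entries sum to 1).
JOINT = [[0.30, 0.02, 0.01],
         [0.03, 0.25, 0.05],
         [0.02, 0.04, 0.28]]

h_xy, p_err = conditional_entropy_and_error(JOINT)
# By (1), the first printed value should not exceed the second.
print(h_xy, fano_rhs(p_err, len(JOINT)))
```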

As a consequence of this result, the following theorem gives an upper bound on the size of a code as a function of the average error probability.

Theorem 3: Every $(M, \epsilon)$-code (average probability of error) for a random transformation $P_{Y|X}$ satisfies

$$\log M \le \frac{1}{1 - \epsilon} \sup_{X} I(X;Y) + \frac{1}{1 - \epsilon} H(\epsilon), \tag{3}$$

where $\epsilon = \Pr\{X \ne Y\}$ and $H(x) = -[x \log x + (1 - x) \log(1 - x)]$.

Although simple, these two theorems are insightful and easy to evaluate, both analytically and numerically.
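
The bound (3) is straightforward to evaluate once the supremum of the mutual information is known. The sketch below is an illustrative helper only; the function names and the example values (1.5 bits and $\epsilon = 0.1$) are assumptions chosen for demonstration, and logarithms are base 2.

```python
import math

def binary_entropy(x):
    """Binary entropy H(x) in bits, with the convention 0 log 0 = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -(x * math.log2(x) + (1 - x) * math.log2(1 - x))

def fano_codebook_bound(sup_mutual_information, eps):
    """Upper bound on log2(M) from (3), in bits."""
    if not 0.0 <= eps < 1.0:
        raise ValueError("eps must lie in [0, 1)")
    return (sup_mutual_information + binary_entropy(eps)) / (1.0 - eps)

# Hypothetical example: sup I(X;Y) = 1.5 bits, average error probability 0.1.
print(fano_codebook_bound(1.5, 0.1))
```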

The classical coding theorems are mainly based on the asymptotic equipartition property (AEP) and the typical/joint-typical decoder [1]. However, applying the AEP requires infinitely long codewords, for which the error probability goes to either 0 or 1 as the code length goes to infinity. Although these coding theorems provide fundamental limits for modern communications, research on finite blocklength coding schemes is more important in engineering applications. Given the block length, upper bounds on the achievable error probability and the achievable code size were obtained in [8]. Most importantly, a tight approximation of the maximal achievable rate for a given error probability and code length was presented.

In this paper, we consider the entropy of one random vector conditioned on another, and the corresponding probability of error in guessing one from the other, by proposing an extended Fano's inequality. The extended Fano's inequality performs better because it considers the error patterns more carefully. It is better suited to coding in the finite blocklength regime and is useful in bounding the mutual information between random vectors, as well as the codebook size under block length and average symbol error probability constraints.

The rest of this paper is organized as follows. The extended Fano's inequality is presented in Section II. Lower bounds on the mutual information between two random vectors and an upper bound on the codebook size for a given block length and error probability are given in Section III. An application of the obtained results to the q-ary symmetric channel (QSC) is presented in Section IV, which shows that the extended Fano's inequality is tight for such channels. Finally, we conclude the paper in Section V. Throughout the paper, vectors are indicated in bold.

II. Fano's Inequality Extension 

Although Fano's inequality has been widely used, it can be improved by treating the error events more carefully. In this section, a refinement of Fano's inequality is presented, which is tighter and better suited to finite blocklength coding design.

Theorem 4: Fano's Inequality Extension

Suppose that $\mathbf{X} = \{X_1, X_2, \cdots, X_n\}$ and $\mathbf{Y} = \{Y_1, Y_2, \cdots, Y_n\}$ are two $n$-dimensional random vectors, where $X_k$ and $Y_k$ ($k = 1, 2, \cdots, n$) take values on the same finite set $\mathcal{X}$ with cardinality $|\mathcal{X}| = q$. Then the conditional entropy satisfies

$$H(\mathbf{X}|\mathbf{Y}) \le H(\mathbf{p}) + \sum_{k=1}^{n} p_k \log\left(C_n^k (q - 1)^k\right), \tag{4}$$

where $H(\mathbf{p}) = -\sum_{k=0}^{n} p_k \log p_k$ is the discrete entropy function, $C_n^k$ denotes the binomial coefficient, and $\mathbf{p} = \{p_0, p_1, \cdots, p_n\}$ is the error distribution with error probabilities $p_k = \Pr(H_d(\mathbf{X}, \mathbf{Y}) = k)$ for $k = 0, 1, \cdots, n$. Here $H_d(\mathbf{X}, \mathbf{Y})$ is the generalized Hamming distance, defined as the number of symbols in $\mathbf{X}$ that differ from the corresponding symbols in $\mathbf{Y}$.
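
Inequality (4) can be sanity-checked by brute force for small $n$ and $q$. The following minimal Python sketch draws a random joint distribution of $(\mathbf{X}, \mathbf{Y})$, computes the exact conditional entropy and the error distribution $\mathbf{p}$, and evaluates the right-hand side of (4). All names, the random model, and the parameters are illustrative assumptions; logarithms are base 2.

```python
import itertools
import math
import random

def theorem4_bound(p, q):
    """Right-hand side of (4) for an error distribution p = [p_0, ..., p_n]."""
    n = len(p) - 1
    entropy = -sum(pk * math.log2(pk) for pk in p if pk > 0)
    patterns = sum(p[k] * math.log2(math.comb(n, k) * (q - 1) ** k)
                   for k in range(1, n + 1))
    return entropy + patterns

def random_joint_check(n=2, q=3, seed=0):
    """Return (H(X|Y), bound) for a randomly drawn joint pmf on q-ary n-vectors."""
    rng = random.Random(seed)
    vectors = list(itertools.product(range(q), repeat=n))
    weights = {(x, y): rng.random() for x in vectors for y in vectors}
    total = sum(weights.values())
    p = [0.0] * (n + 1)                   # error distribution over Hamming distances
    p_y = dict.fromkeys(vectors, 0.0)     # marginal distribution of Y
    for (x, y), weight in weights.items():
        prob = weight / total
        p[sum(a != b for a, b in zip(x, y))] += prob
        p_y[y] += prob
    h_cond = -sum((wt / total) * math.log2((wt / total) / p_y[y])
                  for (x, y), wt in weights.items() if wt > 0)
    return h_cond, theorem4_bound(p, q)

# By (4), the first value should not exceed the second.
print(random_joint_check(n=2, q=3, seed=0))
```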

Proof: Define the error random variable as $E = k$ if $H_d(\mathbf{X}, \mathbf{Y}) = k$, for $k = 0, 1, \cdots, n$. By the chain rule for entropy, $H(E, \mathbf{X}|\mathbf{Y})$ can be expanded in two ways:

$$H(E, \mathbf{X}|\mathbf{Y}) = H(\mathbf{X}|\mathbf{Y}) + H(E|\mathbf{X}, \mathbf{Y}) \tag{5a}$$
$$\phantom{H(E, \mathbf{X}|\mathbf{Y})} = H(E|\mathbf{Y}) + H(\mathbf{X}|E, \mathbf{Y}). \tag{5b}$$

In (5a), $H(E|\mathbf{X}, \mathbf{Y}) = 0$ because $E$ is a deterministic function of $(\mathbf{X}, \mathbf{Y})$. Then we have

$$H(\mathbf{X}|\mathbf{Y}) = H(E|\mathbf{Y}) + H(\mathbf{X}|E, \mathbf{Y}) \overset{(a)}{\le} H(E) + H(\mathbf{X}|E, \mathbf{Y}), \tag{6}$$

where (a) follows from the fact that conditioning cannot increase entropy, i.e., $H(E) \ge H(E|\mathbf{Y})$. Moreover, $H(E) = H(\mathbf{p}) = -\sum_{k=0}^{n} p_k \log p_k$.

According to its definition, we have

$$H(\mathbf{X}|E, \mathbf{Y}) = \sum_{k=0}^{n} p_k H(\mathbf{X}|E = k, \mathbf{Y}). \tag{7}$$

When considering $H(\mathbf{X}|E = k, \mathbf{Y})$, we know that $\mathbf{X}$ and $\mathbf{Y}$ differ in exactly $k$ symbol pairs. For each fixed $\mathbf{Y} = \mathbf{y}$, every symbol of $\mathbf{X}$ in a differing pair has $q - 1$ possible values, namely all values except the corresponding one in $\mathbf{y}$. Thus, for a given set of error positions, $\mathbf{X}$ has $(q - 1)^k$ choices. Besides, there are $C_n^k$ ways to select the positions of the error symbols for each given $k$. Therefore, the total number of possible codewords $\mathbf{X}$ is $C_n^k (q - 1)^k$, which means

$$H(\mathbf{X}|E = k, \mathbf{Y}) \le \log\left(C_n^k (q - 1)^k\right). \tag{8}$$

In particular, $H(\mathbf{X}|E = 0, \mathbf{Y}) = 0$, since there is no uncertainty in determining $\mathbf{X}$ from $\mathbf{Y}$ if they are the same. Then (7) can be written as

$$H(\mathbf{X}|E, \mathbf{Y}) = \sum_{k=0}^{n} \Pr(E = k) H(\mathbf{X}|E = k, \mathbf{Y}) \le \sum_{k=1}^{n} p_k \log\left(C_n^k (q - 1)^k\right). \tag{9}$$

Combining (6) and (9) completes the proof of the theorem.

■

Fig. 1. The q-ary symmetric channel.

Remark 1: In fact, the error distribution $\mathbf{p}$ is easy to calculate, especially for some special channels. For example, the discrete q-ary symmetric channel is shown in Fig. 1. In this case, $p_k = C_n^k ((q - 1)\epsilon)^k (1 - (q - 1)\epsilon)^{n - k}$.

Remark 2: Theorem 4 is clearly a generalization of Fano's inequality. Specifically, when the block length is 1, i.e., $n = 1$, we have $p_1 = p_e$, $p_0 = 1 - p_e$, and $C_1^1 = 1$. In this case, Theorem 4 reduces to

$$H(X|Y) \le H(p_e) + p_e \log(q - 1), \tag{10}$$

which is exactly Fano's inequality.
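
Remarks 1 and 2 can be illustrated numerically. The sketch below, with hypothetical function names and example parameters, builds the error distribution of a memoryless QSC from the formula in Remark 1 and checks that the bound of Theorem 4 with $n = 1$ coincides with the right-hand side of Fano's inequality (10). Logarithms are base 2.

```python
import math

def qsc_error_distribution(n, q, eps):
    """p_k = C(n, k) ((q-1) eps)^k (1 - (q-1) eps)^(n-k), as in Remark 1."""
    p_e = (q - 1) * eps
    return [math.comb(n, k) * p_e ** k * (1 - p_e) ** (n - k) for k in range(n + 1)]

def theorem4_bound(p, q):
    """Right-hand side of (4) for an error distribution p = [p_0, ..., p_n]."""
    n = len(p) - 1
    entropy = -sum(pk * math.log2(pk) for pk in p if pk > 0)
    return entropy + sum(p[k] * math.log2(math.comb(n, k) * (q - 1) ** k)
                         for k in range(1, n + 1))

def fano_rhs(p_e, q):
    """Right-hand side of (10): H(p_e) + p_e log(q - 1), assuming 0 < p_e < 1."""
    h = -(p_e * math.log2(p_e) + (1 - p_e) * math.log2(1 - p_e))
    return h + p_e * math.log2(q - 1)

p_single = qsc_error_distribution(n=1, q=7, eps=0.001)   # [1 - p_e, p_e]
print(theorem4_bound(p_single, 7), fano_rhs(p_single[1], 7))
```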

As a variant of Theorem 4, the following theorem bounds the conditional entropy in terms of relative entropy.

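Theorem 5: Suppose that $\mathbf{X} = \{X_1, X_2, \cdots, X_n\}$ and $\mathbf{Y} = \{Y_1, Y_2, \cdots, Y_n\}$ are two $n$-dimensional random vectors, where $X_k$ and $Y_k$ ($k = 1, 2, \cdots, n$) take values on the same finite set $\mathcal{X}$ with cardinality $|\mathcal{X}| = q$. Then the conditional entropy satisfies

$$H(\mathbf{X}|\mathbf{Y}) \le n \log q - D(\mathbf{p}\|\mathbf{q}), \tag{11}$$

where $D(\mathbf{p}\|\mathbf{q}) = \sum_{k=0}^{n} p_k \log \frac{p_k}{q_k}$ is the discrete relative entropy function. The error probabilities are $p_k = \Pr(H_d(\mathbf{X}, \mathbf{Y}) = k)$ for $k = 0, 1, \cdots, n$. Denote $\mathbf{p} = \{p_0, p_1, \cdots, p_n\}$, and let $\mathbf{q} = \{q_0, q_1, \cdots, q_n\}$ be the probability distribution with $q_k = \frac{C_n^k (q - 1)^k}{q^n}$.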
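Proof: First, we know from the binomial theorem that $\left\{ \frac{C_n^k a^k b^{n-k}}{(a + b)^n} \right\}_{0 \le k \le n}$ is a probability distribution. Letting $a = q - 1$ and $b = 1$, we see that $\mathbf{q}$ is a probability distribution, where

$$q_k = \frac{C_n^k (q - 1)^k}{q^n}.$$

According to Theorem 4,

$$H(\mathbf{X}|\mathbf{Y}) \overset{(a)}{\le} -\sum_{k=0}^{n} p_k \log \frac{p_k}{C_n^k (q - 1)^k} = n \log q - D(\mathbf{p}\|\mathbf{q}),$$

where (a) holds because $\log\left(C_n^k (q - 1)^k\right) = 0$ for $k = 0$. ■
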
Remark 3: By its definition, $\mathbf{p}$ reflects the error performance of the channel and is totally determined by the channel itself. In contrast, $\mathbf{q}$ is a distribution under which every error pattern is assumed to be equiprobable. In this situation, the probability that there are $k$ error symbols in the codeword is $q_k = C_n^k \left(\frac{q - 1}{q}\right)^k \left(\frac{1}{q}\right)^{n - k}$. Thus $D(\mathbf{p}\|\mathbf{q})$ measures the distance between the actual error pattern distribution and the uniform error pattern distribution. In particular, if the channel is error free, i.e., $p_0 = 1$ and $p_k = 0$ for $1 \le k \le n$, we have $D(\mathbf{p}\|\mathbf{q}) = \sum_{k=0}^{n} p_k \log \frac{p_k}{q_k} = 1 \cdot \log \frac{1}{1/q^n} = \log q^n$. Theorem 5 then gives $H(\mathbf{X}|\mathbf{Y}) \le n \log q - \log q^n = 0$, i.e., $H(\mathbf{X}|\mathbf{Y}) = 0$, which is consistent with the assumption on the channel. In this sense, Theorem 5 is tight.
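
The error-free case in Remark 3 is easy to reproduce numerically. The following sketch (hypothetical helper names, base-2 logarithms) evaluates the right-hand side of (11) and shows that it vanishes, up to rounding, when $p_0 = 1$; the second distribution is an arbitrary example.

```python
import math

def uniform_pattern_distribution(n, q):
    """q_k = C(n, k) (q - 1)^k / q^n for k = 0, ..., n."""
    return [math.comb(n, k) * (q - 1) ** k / q ** n for k in range(n + 1)]

def relative_entropy(p, r):
    """D(p || r) in bits, with the convention 0 log 0 = 0."""
    return sum(pk * math.log2(pk / rk) for pk, rk in zip(p, r) if pk > 0)

def theorem5_bound(p, q):
    """Right-hand side of (11): n log q - D(p || q)."""
    n = len(p) - 1
    return n * math.log2(q) - relative_entropy(p, uniform_pattern_distribution(n, q))

error_free = [1.0, 0.0, 0.0, 0.0]            # p_0 = 1 with n = 3
print(theorem5_bound(error_free, 7))         # approximately 0, up to rounding
print(theorem5_bound([0.7, 0.2, 0.1, 0.0], 7))
```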

Fano's inequality has played an important role in the history of information theory because it builds a close connection between conditional entropy and error probability. The extended Fano's inequality given in Theorem 4 is especially applicable to finite blocklength coding. It also relates the conditional entropy to the block and symbol error probabilities, which are defined as follows.

Definition 1: The block error probability is the average error probability of a block (codeword), i.e., $P_b = \Pr\{\mathbf{X} \ne \mathbf{Y}\}$. Then we have $P_b = \sum_{k=1}^{n} p_k$. Thus, $P_b \ge p_k$ for any $1 \le k \le n$.

Definition 2: The symbol error probability $P_s$ is the average error probability of a symbol, i.e., $P_s = \Pr\{X_k \ne Y_k\}$, which can be expressed as $P_s = \frac{1}{n} \sum_{k=1}^{n} k p_k$.
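
Both error probabilities follow directly from the error distribution $\mathbf{p}$. The sketch below computes them for the QSC error distribution of Remark 1; the function names and the parameters ($n = 30$, $q = 7$, $\epsilon = 0.001$) are illustrative assumptions.

```python
import math

def qsc_error_distribution(n, q, eps):
    """p_k = C(n, k) p_e^k (1 - p_e)^(n-k) with p_e = (q - 1) eps."""
    p_e = (q - 1) * eps
    return [math.comb(n, k) * p_e ** k * (1 - p_e) ** (n - k) for k in range(n + 1)]

def block_error_probability(p):
    """P_b = sum of p_k over k = 1, ..., n (Definition 1)."""
    return sum(p[1:])

def symbol_error_probability(p):
    """P_s = (1/n) * sum of k p_k over k = 1, ..., n (Definition 2)."""
    n = len(p) - 1
    return sum(k * p[k] for k in range(1, n + 1)) / n

p = qsc_error_distribution(n=30, q=7, eps=0.001)
print(block_error_probability(p), symbol_error_probability(p))
```

For this binomial error distribution, $P_s$ equals the per-symbol error probability $(q - 1)\epsilon$, while $P_b$ grows with the block length.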

Remark 4: In many communication systems, especially those using error-correcting channel codes, a block error does not imply a system failure. On the contrary, the error can be corrected, or part of the block can still be used with some performance degradation. In this case, the symbol error probability is more useful than the block error probability.

The following corollary of our result addresses this case by bounding the conditional entropy in terms of the symbol error probability.

Corollary 1: Suppose that $\mathbf{X} = \{X_1, X_2, \cdots, X_n\}$ and $\mathbf{Y} = \{Y_1, Y_2, \cdots, Y_n\}$ are two $n$-dimensional random vectors, where $X_k$ and $Y_k$ take values on the same finite set $\mathcal{X}$ with cardinality $|\mathcal{X}| = q$. Then the conditional entropy satisfies

$$H(\mathbf{X}|\mathbf{Y}) \le n - D(\mathbf{p}\|\mathbf{w}) + n P_s \log(q - 1), \tag{12}$$

where $\mathbf{w} = \{w_0, w_1, \cdots, w_n\}$ is the probability distribution with $w_k = \frac{C_n^k}{2^n}$.

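Proof: According to Theorem 4, one has

$$\begin{aligned}
H(\mathbf{X}|\mathbf{Y}) &\le H(\mathbf{p}) + \sum_{k=0}^{n} p_k \log\left(\frac{C_n^k}{2^n} \cdot 2^n\right) + \sum_{k=0}^{n} p_k \log (q - 1)^k \\
&= -\sum_{k=0}^{n} p_k \log p_k + \sum_{k=0}^{n} p_k \log w_k + \sum_{k=0}^{n} p_k \log 2^n + \sum_{k=0}^{n} k p_k \log(q - 1) \\
&= n - D(\mathbf{p}\|\mathbf{w}) + n P_s \log(q - 1).
\end{aligned}$$

■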

Remark 5: The distribution $\mathbf{w}$ can be expressed as $w_k = C_n^k \left(\frac{1}{2}\right)^k \left(\frac{1}{2}\right)^{n - k}$, which is a binomial distribution with symbol error probability 0.5. Thus $D(\mathbf{p}\|\mathbf{w})$ measures the distance between the error probability distribution and the binomial distribution with parameter 0.5.

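Remark 6: If one takes $n = 1$, then $\mathbf{w} = \{\frac{1}{2}, \frac{1}{2}\}$ and $1 - D(\mathbf{p}\|\mathbf{w}) = H(P_s)$, so Corollary 1 reduces to $H(X|Y) \le H(P_s) + P_s \log(q - 1) \le 1 + P_s \log(q - 1)$, the last expression being a frequently used form of Fano's inequality.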

Remark 7: The extended Fano's inequality builds a natural connection between the conditional entropy and the symbol error probability, and is especially applicable to finite-length coding.
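
Since the proof of Corollary 1 only rewrites the right-hand side of (4), the two bounds should agree numerically for any error distribution. The following sketch checks this for an arbitrary example distribution; the function names and the values of $\mathbf{p}$ and $q$ are assumptions for illustration, and logarithms are base 2.

```python
import math

def relative_entropy(p, r):
    """D(p || r) in bits, with the convention 0 log 0 = 0."""
    return sum(pk * math.log2(pk / rk) for pk, rk in zip(p, r) if pk > 0)

def corollary1_bound(p, q):
    """Right-hand side of (12): n - D(p || w) + n P_s log(q - 1)."""
    n = len(p) - 1
    w = [math.comb(n, k) / 2 ** n for k in range(n + 1)]
    p_s = sum(k * p[k] for k in range(n + 1)) / n
    return n - relative_entropy(p, w) + n * p_s * math.log2(q - 1)

def theorem4_bound(p, q):
    """Right-hand side of (4)."""
    n = len(p) - 1
    entropy = -sum(pk * math.log2(pk) for pk in p if pk > 0)
    return entropy + sum(p[k] * math.log2(math.comb(n, k) * (q - 1) ** k)
                         for k in range(1, n + 1))

p_example = [0.7, 0.2, 0.1]   # hypothetical error distribution for n = 2
print(corollary1_bound(p_example, 4), theorem4_bound(p_example, 4))
```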

III. Converse Results 

A. Lower Bounds on the Mutual Information 

Based on the proposed generalized Fano's inequality, the 
following lower bounds on the mutual information between 
X and Y can be obtained. 

Theorem 6: Suppose that $\mathbf{X} = \{X_1, X_2, \cdots, X_n\}$ and $\mathbf{Y} = \{Y_1, Y_2, \cdots, Y_n\}$ are two $n$-dimensional random vectors that satisfy the following.

1) $X_k$ and $Y_k$ ($k = 1, 2, \cdots, n$) take values on the same finite set $\mathcal{X}$ with cardinality $|\mathcal{X}| = q$.

2) Either $\mathbf{X}$ or $\mathbf{Y}$ is equiprobable.

3) The error probabilities are $p_k = \Pr(H_d(\mathbf{X}, \mathbf{Y}) = k)$ for $k = 0, 1, \cdots, n$. Denote the error distribution as $\mathbf{p} = \{p_0, p_1, \cdots, p_n\}$.

Then the mutual information between $\mathbf{X}$ and $\mathbf{Y}$ satisfies

$$I(\mathbf{X};\mathbf{Y}) \ge n \log q - H(\mathbf{p}) - \sum_{k=1}^{n} p_k \log\left(C_n^k (q - 1)^k\right), \tag{13}$$

where $H(\mathbf{p}) = -\sum_{k=0}^{n} p_k \log p_k$ is the entropy function.
Proof: If $\mathbf{X}$ is equiprobable, $H(\mathbf{X}) = \log q^n = n \log q$. On the other hand, the mutual information is given by $I(\mathbf{X};\mathbf{Y}) = H(\mathbf{X}) - H(\mathbf{X}|\mathbf{Y})$. Together with Theorem 4, this gives

$$I(\mathbf{X};\mathbf{Y}) \ge n \log q - H(\mathbf{p}) - \sum_{k=1}^{n} p_k \log\left(C_n^k (q - 1)^k\right). \tag{14}$$

Note that $\mathbf{X}$ and $\mathbf{Y}$ are completely symmetric in (14). Therefore, if $\mathbf{Y}$ is assumed to be equiprobable at the beginning of the proof, the same result is obtained. Thus Theorem 6 is proved.

■

By using Theorem 5, the mutual information between $\mathbf{X}$ and $\mathbf{Y}$ can be bounded by the following corollary.

Corollary 2: Suppose that $\mathbf{X} = \{X_1, X_2, \cdots, X_n\}$ and $\mathbf{Y} = \{Y_1, Y_2, \cdots, Y_n\}$ are two $n$-dimensional random vectors that satisfy the following.

1) $X_k$ and $Y_k$ ($k = 1, 2, \cdots, n$) take values on the same finite set $\mathcal{X}$ with cardinality $|\mathcal{X}| = q$.

2) Either $\mathbf{X}$ or $\mathbf{Y}$ is equiprobable.

3) The error probabilities are $p_k = \Pr(H_d(\mathbf{X}, \mathbf{Y}) = k)$ and $\mathbf{p} = \{p_0, p_1, \cdots, p_n\}$.

Then the mutual information between $\mathbf{X}$ and $\mathbf{Y}$ satisfies

$$I(\mathbf{X};\mathbf{Y}) \ge D(\mathbf{p}\|\mathbf{q}), \tag{15}$$

where $D(\mathbf{p}\|\mathbf{q}) = \sum_{k=0}^{n} p_k \log \frac{p_k}{q_k}$ is the discrete relative entropy function and $\mathbf{q} = \{q_0, q_1, \cdots, q_n\}$ is the probability distribution with $q_k = \frac{C_n^k (q - 1)^k}{q^n}$.



The distribution $\mathbf{q}$ corresponds to each symbol in $\mathbf{Y}$ taking any value in the alphabet with equal probability, regardless of what is sent in $\mathbf{X}$. It is therefore the error distribution of a purely random channel, for which $\mathbf{X}$ and $\mathbf{Y}$ are independent of each other. The most desirable coding scheme is one whose error distribution $\mathbf{p}$ is farthest from $\mathbf{q}$, which also ensures a larger coding rate.
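
The two lower bounds (13) and (15) are algebraically the same quantity, and both vanish when $\mathbf{p} = \mathbf{q}$, i.e., in the purely random case. The sketch below illustrates this with hypothetical helper names and an arbitrary example distribution; logarithms are base 2.

```python
import math

def uniform_pattern_distribution(n, q):
    """q_k = C(n, k) (q - 1)^k / q^n for k = 0, ..., n."""
    return [math.comb(n, k) * (q - 1) ** k / q ** n for k in range(n + 1)]

def mi_lower_bound_theorem6(p, q):
    """Right-hand side of (13)."""
    n = len(p) - 1
    entropy = -sum(pk * math.log2(pk) for pk in p if pk > 0)
    patterns = sum(p[k] * math.log2(math.comb(n, k) * (q - 1) ** k)
                   for k in range(1, n + 1))
    return n * math.log2(q) - entropy - patterns

def mi_lower_bound_corollary2(p, q):
    """Right-hand side of (15): D(p || q)."""
    n = len(p) - 1
    q_dist = uniform_pattern_distribution(n, q)
    return sum(pk * math.log2(pk / qk) for pk, qk in zip(p, q_dist) if pk > 0)

p_example = [0.9, 0.08, 0.02]     # hypothetical error distribution for n = 2
print(mi_lower_bound_theorem6(p_example, 5), mi_lower_bound_corollary2(p_example, 5))
print(mi_lower_bound_corollary2(uniform_pattern_distribution(2, 5), 5))  # purely random case
```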

B. Upper Bounds on the Codebook Size 

Suppose $\mathcal{X}$ is a finite alphabet with cardinality $|\mathcal{X}| = q$. Consider the input and output alphabets $A^n = B^n \subset \mathcal{X}^n$ with $|A^n| = |B^n| = M$, and let a channel be a sequence of conditional probabilities $\{P_{\mathbf{Y}|\mathbf{X}} : A^n \to B^n\}$. We denote a codebook with $M$ codewords by $(\mathbf{X}_1, \mathbf{X}_2, \cdots, \mathbf{X}_M) \in A^n$. A decoder is a random transformation $P_{Z|\mathbf{Y}} : B^n \to \{0, 1, 2, \cdots, M\}$, where 0 indicates that the decoder chooses "error". If the messages are equiprobable, the average error probability is defined as $P_b = 1 - \frac{1}{M} \sum_{m=1}^{M} P_{Z|\mathbf{X}}(m|\mathbf{X}_m)$. A codebook with $M$ codewords and a decoder whose average probability of error is smaller than $\epsilon$ are called an $(n, M, \epsilon)$-code.

An upper bound on the size of a code as a function of the average probability of symbol error follows from Corollary 1.

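Theorem 7: Every $(n, M, \epsilon)$-code for a random transformation $P_{Z|\mathbf{Y}}$ satisfies

$$\log M \le \sup_{\mathbf{X}} I(\mathbf{X};\mathbf{Y}) - D(\mathbf{p}\|\mathbf{w}) + n\left(1 + P_s \log(q - 1)\right), \tag{16}$$

where $\mathbf{p} = \{p_0, p_1, \cdots, p_n\}$ is the error distribution with $p_k = \Pr(H_d(\mathbf{X}, \mathbf{Y}) = k)$ for $k = 0, 1, \cdots, n$, and $\mathbf{w} = \{w_0, w_1, \cdots, w_n\}$ is the probability distribution with $w_k = \frac{C_n^k}{2^n}$.

Proof: Since the messages are equiprobable, we have $H(\mathbf{X}) = \log M$. According to Corollary 1,

$$\begin{aligned}
I(\mathbf{X};\mathbf{Y}) &= H(\mathbf{X}) - H(\mathbf{X}|\mathbf{Y}) \\
&\ge \log M - n + D(\mathbf{p}\|\mathbf{w}) - n P_s \log(q - 1).
\end{aligned} \tag{17}$$

Solving (17) for $\log M$ and taking the supremum over $\mathbf{X}$ yields (16), which completes the proof. ■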
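
The bound (16) can be evaluated directly from an error distribution. The sketch below is a minimal illustration: the function names are hypothetical, the error distribution is the QSC one from Remark 1, and using $n$ times the per-symbol QSC capacity as the supremum of the mutual information is an assumption made for this example. Logarithms are base 2.

```python
import math

def qsc_error_distribution(n, q, eps):
    """p_k = C(n, k) p_e^k (1 - p_e)^(n-k) with p_e = (q - 1) eps."""
    p_e = (q - 1) * eps
    return [math.comb(n, k) * p_e ** k * (1 - p_e) ** (n - k) for k in range(n + 1)]

def codebook_size_bound(sup_mutual_information, p, q):
    """Right-hand side of (16), an upper bound on log2(M) in bits."""
    n = len(p) - 1
    w = [math.comb(n, k) / 2 ** n for k in range(n + 1)]
    d_pw = sum(pk * math.log2(pk / wk) for pk, wk in zip(p, w) if pk > 0)
    p_s = sum(k * p[k] for k in range(n + 1)) / n
    return sup_mutual_information - d_pw + n * (1 + p_s * math.log2(q - 1))

n, q, eps = 30, 7, 0.001
p_e = (q - 1) * eps
# Assumed supremum: n times the per-symbol capacity of the memoryless QSC.
capacity = math.log2(q) + (1 - p_e) * math.log2(1 - p_e) + (q - 1) * eps * math.log2(eps)
print(codebook_size_bound(n * capacity, qsc_error_distribution(n, q, eps), q))
```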

IV. Application to Channel Coding 

Consider information transmission over a memoryless discrete q-ary symmetric channel with crossover probability $\epsilon$, using an $(n, M, \epsilon)$ channel code, as shown in Fig. 1. In this case, the probability of symbol error is $p_e = (q - 1)\epsilon$.

Then the error probabilities are

$$p_k = \Pr\{H_d(\mathbf{X}, \mathbf{Y}) = k\} = C_n^k p_e^k (1 - p_e)^{n - k}, \tag{18}$$

and the block error probability is

$$P_b = 1 - p_0 = 1 - (1 - p_e)^n. \tag{19}$$

Using the extended Fano's inequality in Theorem 4, we have

$$H_e(\mathbf{X}|\mathbf{Y}) \le \sum_{k=0}^{n} p_k \log \frac{C_n^k (q - 1)^k}{p_k}. \tag{20}$$

It is easy to see that the exact conditional entropy is

$$H(X|Y) = H(1 - p_e, \epsilon, \cdots, \epsilon) = -(1 - p_e) \log(1 - p_e) - (q - 1)\epsilon \log \epsilon. \tag{21}$$
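
The tightness of the extended bound on the QSC can be checked numerically. The sketch below evaluates (20) with the binomial error distribution (18) and compares it with $n$ times the expression in (21). Reading (21) as a per-symbol quantity, so that the $n$-vector conditional entropy is $n$ times it for independent, uniformly distributed inputs, is an interpretation assumed here; the function names are hypothetical and logarithms are base 2.

```python
import math

def qsc_error_distribution(n, q, eps):
    """p_k from (18) with p_e = (q - 1) eps."""
    p_e = (q - 1) * eps
    return [math.comb(n, k) * p_e ** k * (1 - p_e) ** (n - k) for k in range(n + 1)]

def extended_fano_bound_qsc(n, q, eps):
    """Right-hand side of (20)."""
    p = qsc_error_distribution(n, q, eps)
    return sum(p[k] * math.log2(math.comb(n, k) * (q - 1) ** k / p[k])
               for k in range(n + 1) if p[k] > 0)

def qsc_conditional_entropy_per_symbol(q, eps):
    """Expression (21): H(1 - p_e, eps, ..., eps)."""
    p_e = (q - 1) * eps
    return -(1 - p_e) * math.log2(1 - p_e) - (q - 1) * eps * math.log2(eps)

n, q, eps = 30, 7, 0.001
print(extended_fano_bound_qsc(n, q, eps), n * qsc_conditional_entropy_per_symbol(q, eps))
```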



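By Corollary 2, the mutual information is lower bounded by

$$\begin{aligned}
I_e(\mathbf{X};\mathbf{Y}) &\ge \sum_{k=0}^{n} p_k \log \frac{C_n^k p_e^k (1 - p_e)^{n - k}}{\left(C_n^k (q - 1)^k\right) / q^n} \\
&= n \log q + \sum_{k=0}^{n} p_k \left[k \log p_e + (n - k) \log(1 - p_e) - k \log(q - 1)\right],
\end{aligned} \tag{22}$$

while the capacity of the memoryless QSC is given by

$$\begin{aligned}
I(X;Y) &= \log q - H(1 - p_e, \epsilon, \cdots, \epsilon) \\
&= \log q + (1 - p_e) \log(1 - p_e) + (q - 1)\epsilon \log \epsilon.
\end{aligned} \tag{23}$$

The relative entropy $D(\mathbf{p}\|\mathbf{w})$ can be derived as

$$\begin{aligned}
D(\mathbf{p}\|\mathbf{w}) &= \sum_{k=0}^{n} p_k \log \frac{C_n^k p_e^k (1 - p_e)^{n - k}}{C_n^k / 2^n} \\
&= n + \sum_{k=0}^{n} p_k \left[k \log p_e + (n - k) \log(1 - p_e)\right].
\end{aligned} \tag{24}$$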
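
A short numerical check of (22) to (24) is sketched below, with hypothetical function names and base-2 logarithms. It evaluates the lower bound (22), compares it with $n$ times the per-symbol capacity (23), and evaluates the closed form (24).

```python
import math

def qsc_error_distribution(n, q, eps):
    """p_k from (18) with p_e = (q - 1) eps."""
    p_e = (q - 1) * eps
    return [math.comb(n, k) * p_e ** k * (1 - p_e) ** (n - k) for k in range(n + 1)]

def mi_lower_bound_qsc(n, q, eps):
    """Second line of (22)."""
    p_e = (q - 1) * eps
    p = qsc_error_distribution(n, q, eps)
    return n * math.log2(q) + sum(
        p[k] * (k * math.log2(p_e) + (n - k) * math.log2(1 - p_e) - k * math.log2(q - 1))
        for k in range(n + 1))

def qsc_capacity(q, eps):
    """Per-symbol capacity (23)."""
    p_e = (q - 1) * eps
    return math.log2(q) + (1 - p_e) * math.log2(1 - p_e) + (q - 1) * eps * math.log2(eps)

def relative_entropy_to_w(n, q, eps):
    """Second line of (24)."""
    p_e = (q - 1) * eps
    p = qsc_error_distribution(n, q, eps)
    return n + sum(p[k] * (k * math.log2(p_e) + (n - k) * math.log2(1 - p_e))
                   for k in range(n + 1))

n, q, eps = 30, 7, 0.001
print(mi_lower_bound_qsc(n, q, eps), n * qsc_capacity(q, eps), relative_entropy_to_w(n, q, eps))
```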



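For a given average symbol error probability constraint $P_s = \epsilon$, the upper bound on the maximum codebook size given by Theorem 7 is

$$\begin{aligned}
\log M_e &\le I(\mathbf{X};\mathbf{Y}) - D(\mathbf{p}\|\mathbf{w}) + n\left(1 + P_s \log(q - 1)\right) \\
&= n \log q - n H(1 - p_e, \epsilon, \cdots, \epsilon) - \sum_{k=0}^{n} p_k \left[k \log p_e + (n - k) \log(1 - p_e)\right] + n \epsilon \log(q - 1).
\end{aligned} \tag{25}$$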

On the other hand, by Fano's inequality we have

$$H_f(\mathbf{X}|\mathbf{Y}) \le H(P_b) + P_b \log(q^n - 1), \tag{26}$$

with $P_b$ given by (19).

Then the lower bound on the mutual information is

$$I_f(\mathbf{X};\mathbf{Y}) \ge (1 - P_b) \log q^n - H(P_b). \tag{27}$$

Finally, the upper bound on the codebook size is

$$\log M_f \le \frac{1}{1 - \epsilon} \left(n I(X;Y) + H(\epsilon)\right). \tag{28}$$
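
The comparison between the classical and the extended bounds can be sketched as follows. The code evaluates (26) and (27) with $P_b$ from (19), and (20) and (22) with the error distribution (18); the function names and parameter values are illustrative assumptions, and logarithms are base 2.

```python
import math

def binary_entropy(x):
    """Binary entropy H(x) in bits, with the convention 0 log 0 = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -(x * math.log2(x) + (1 - x) * math.log2(1 - x))

def fano_bounds_qsc(n, q, eps):
    """Return (right-hand side of (26), right-hand side of (27))."""
    p_b = 1 - (1 - (q - 1) * eps) ** n                      # block error probability (19)
    h_f = binary_entropy(p_b) + p_b * math.log2(q ** n - 1)
    i_f = (1 - p_b) * n * math.log2(q) - binary_entropy(p_b)
    return h_f, i_f

def extended_bounds_qsc(n, q, eps):
    """Return (right-hand side of (20), right-hand side of (22))."""
    p_e = (q - 1) * eps
    p = [math.comb(n, k) * p_e ** k * (1 - p_e) ** (n - k) for k in range(n + 1)]
    h_e = sum(p[k] * math.log2(math.comb(n, k) * (q - 1) ** k / p[k])
              for k in range(n + 1) if p[k] > 0)
    i_e = n * math.log2(q) + sum(
        p[k] * (k * math.log2(p_e) + (n - k) * math.log2(1 - p_e) - k * math.log2(q - 1))
        for k in range(n + 1))
    return h_e, i_e

for n in (1, 10, 30):
    print(n, fano_bounds_qsc(n, 7, 0.001), extended_bounds_qsc(n, 7, 0.001))
```

In this example the extended upper bound on the conditional entropy should never exceed the classical one, and the extended lower bound on the mutual information should never fall below the classical one.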



Suppose the QSC parameters are $\epsilon = 0.001$ and $q = 7$. We calculate the bounds on the conditional entropy, the mutual information, and the codebook size using both our proposed results and Fano's inequality.

First, the upper bounds on the conditional entropy are presented in Fig. 2. Specifically, $H_e(\mathbf{X}|\mathbf{Y})$ is obtained from the extended Fano's inequality (20), $H_f(\mathbf{X}|\mathbf{Y})$ is calculated according to Fano's inequality (26), and $H(\mathbf{X}|\mathbf{Y})$ is the exact conditional entropy (21). It is clear that Theorem 4 is tighter than Fano's inequality. In particular, we have $H_e(\mathbf{X}|\mathbf{Y}) = H(\mathbf{X}|\mathbf{Y})$ for the QSC. This is because the error distribution is the same for any $\mathbf{Y} = \mathbf{y}$, so $H(E) = H(E|\mathbf{Y})$ holds. Besides, the error pattern is uniformly distributed for a given $k$, regardless of $\mathbf{y}$, so $H(\mathbf{X}|E = k, \mathbf{Y}) = \log\left(C_n^k (q - 1)^k\right)$ holds. Therefore, the upper bound is tight. For Fano's inequality, however, there are relaxations both from $H(E|\mathbf{Y})$ to $H(E)$ and from $H(\mathbf{X}|E, \mathbf{Y})$ to $P_b \log(M - 1)$.



Fig. 2. The upper bound on conditional entropy, q = 7, ε = 0.001.

Fig. 3. Bounds on mutual information and codebook size versus blocklength, q = 7, ε = 0.001.



Similarly, the lower bound on the mutual information $I_e(\mathbf{X};\mathbf{Y})$ given by (22) coincides with the exact $I(\mathbf{X};\mathbf{Y})$ given by (23), and is better than the bound (27) given by Fano's inequality.

When using the upper bound on the codebook size in Theorem 7, it should be noted that it is expressed as a function of the symbol error probability $P_s$. In fact, $P_b$ is always at least as large as $P_s$. Therefore, to make the comparison meaningful, the error probability constraint $\epsilon$ is set to the same fraction of $P_s$ in (25) and of $P_b$ in (28). It is also seen from Fig. 3 that our newly developed result is tighter.

The performance of Theorem 6 and Theorem 7 versus the QSC parameter $\epsilon$ is shown in Fig. 4, where the block length is chosen as $n = 30$. As shown, the mutual information bound is tight and our results are much better than those from Fano's inequality. In the calculation of the upper bounds on the codebook size, the error probability constraints for the two bounds are again chosen so that they are comparable.



Fig. 4. Bounds on mutual information and codebook size versus QSC parameter ε, q = 7, n = 30.

V. Conclusion

In this paper, we revisited Fano's inequality and extended it to a general form. In particular, the relation between the conditional entropy and the error probability of two random vectors was considered, rather than that between two random variables. This makes the developed results more suitable for source/channel coding in the finite blocklength regime. By investigating the block error pattern in more detail, the conditional entropy of the original random vector given the received one is upper bounded more tightly by the extended Fano's inequality. Furthermore, the extended Fano's inequality is completely tight for some symmetric channels, such as the q-ary symmetric channel. Converse results are also presented in terms of lower bounds on the mutual information and an upper bound on the codebook size under blocklength and symbol error constraints, which are also tighter.